Using the Maximum X-ray Flux Ratio and X-ray Background to
Predict Solar Flare Class 

L.M. Winter and K. Balasubramaniam 

Abstract. 

We present the discovery of a relationship between the maximum ratio of the flare flux
(namely, the 0.5-4 Å to the 1-8 Å flux) and the non-flare background (namely, the 1-8 Å
background flux), which clearly separates flares into classes by peak flux level. We established
this relationship based on an analysis of the Geostationary Operational Environmental
Satellites (GOES) X-ray observations of ~ 50,000 X, M, C, and B flares derived from
the NOAA/SWPC flares catalog. Employing a combination of machine learning techniques
(K-nearest neighbors and nearest-centroid algorithms), we show a separation of
the observed parameters for the different peak flaring energies. This analysis is validated
by successfully predicting the flare classes for 100% of the X-class flares, 76% of the M-class
flares, 80% of the C-class flares, and 81% of the B-class flares for solar cycle 24, based
on training with the parametric extracts for solar flares in cycles 22-23.


1. Introduction 

Solar flares release intense amounts of energy into the
interplanetary medium. A statistical concept of how this
energy is released is through an avalanche model [Lu and
Hamilton, 1991; Lu et al., 1993; Lu, 1995]. According
to common consensus, the creation of a solar flare begins
with magnetohydrodynamic instabilities that release energy
stored in the local magnetic field lines through an untwisting
of the field lines. Conservation of magnetic energy leads
to instabilities in nearby regions through an avalanche process
that accelerates energetic particles along the large-scale
magnetic field lines. Soft X-rays in the solar corona, which
are the topic of the presented analysis, are emitted from the
associated magnetic loops, which are, in turn, connected to
the active region surface magnetic fields, as seen in the photosphere.
In this model, all flares are the result of the same
physical processes and the ultimate strength of the flare is
related to the cascaded number of reconnection events creating
the flare.

While the basic mechanism of solar flares is believed to
be understood, the detailed physical processes are still too
complex to be modeled in a deterministic way for predictions
of when a flare will occur. This is a challenge, since the
space weather effects of the energy release can lead to diverse
problems for human technology such as satellite damage or
inoperability, ionospheric communication interference, and
power grid failures. Therefore, understanding the solar conditions
(e.g., magnetic activity, coronal temperature) that
precede and lead to solar flares is of vital importance to
enhance current predictions of space weather phenomena.

Current solar flare prediction models rely upon empirical
observations, with many predictions based on tracking
the properties of solar active regions (e.g., Gallagher et al.
2002; Barnes et al. 2007; Falconer et al. 2011; Ahmed et al.
2013; Balasubramaniam 2013). Since the magnetic active
regions are the source of the magnetic energy released in
the flare, this approach is well-justified. However, other observable
properties may also provide a diagnostic for the
underlying physical conditions in the solar corona leading
to flares. In particular, we present evidence for two easily
observable soft X-ray measurements that diagnose the non-flare
magnetic energy and coronal temperature (in Section
2). The parameter-space of these observables distinguishes
properties of solar flares of different classes, as described in
Section 3. Machine learning techniques are used to build
classification models applied to historical flares in Section 4.
Finally, the results from the statistical soft X-ray analysis
of solar flares are discussed in Section 5.

2. X-ray Flare Data 

We analyzed the historical X-ray data from NOAA's
GOES X-Ray Sensor (XRS). The GOES observations include
X-ray flux measurements averaged over every 1 minute
observed in both a short-wavelength and long-wavelength X-ray
band (short: 0.5-4 Å and long: 1-8 Å) from 1986 - 2014.
The NOAA flare lists include nearly 50,000 X-ray flares in
this timespan, which covers solar cycles 22-24. The flare
classifications fall into the following classes based on the
peak flux level: X ($> 10^{-4}$ W m$^{-2}$), M ($> 10^{-5}$ W m$^{-2}$), C
($> 10^{-6}$ W m$^{-2}$), and B ($> 10^{-7}$ W m$^{-2}$). Figure 1 shows
the distribution of the flares of each type by year. The majority
of the stronger flares occur close to solar maximum,
indicated in the plots by the maxima in sunspot number
(obtained from the Solar Influences Data Analysis Center in
Belgium).

The NOAA flare lists include the start, peak, and end
time of the flares along with the flare location, if known,
and X-ray class. For our analysis, we use the start and
peak time along with the flare class. The start of a flare
is defined as when four consecutive one-minute 1-8 Å flux
measurements meet all of the following conditions: (1) all
four values are $> 10^{-7}$ W m$^{-2}$, (2) each consecutive measurement
has a higher flux than the previous measurement, and
(3) the last value is > 1.4x the measurement from three
minutes earlier.
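
The three start conditions above can be sketched in a few lines of Python. This is only an illustration, not the NOAA operational code: the function name `find_flare_start`, the sample flux values, and the default threshold of 1e-7 W m^-2 (the B-class level) are assumptions made for the example.

```python
def find_flare_start(flux, threshold=1e-7, rise_factor=1.4):
    """Return the index of the first minute of the first 4-minute window
    meeting all three start conditions, or None if no window qualifies.

    flux: sequence of one-minute 1-8 Angstrom flux values (W m^-2).
    threshold and rise_factor are assumed defaults for this sketch.
    """
    for i in range(len(flux) - 3):
        window = flux[i:i + 4]
        # (1) all four values exceed the threshold
        above = all(f > threshold for f in window)
        # (2) each measurement is higher than the previous one
        rising = all(b > a for a, b in zip(window, window[1:]))
        # (3) the last value exceeds 1.4x the value three minutes earlier
        steep = window[3] > rise_factor * window[0]
        if above and rising and steep:
            return i
    return None


# Hypothetical one-minute flux values, not real GOES data.
flux = [2.0e-7, 2.0e-7, 2.1e-7, 2.4e-7, 2.9e-7, 3.5e-7, 5.0e-7]
start = find_flare_start(flux)
```

With these made-up values the first qualifying window begins at index 1, since the window starting at index 0 is not strictly increasing.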

Using the downloaded XRS data in both the short and
long bands, we measured the long X-ray background flux
($B$) and the ratio of the short to long bands:

$$R = \frac{F_{0.5-4\,\text{\AA}}}{F_{1-8\,\text{\AA}}}. \qquad (1)$$

A full description of our method to determine B is included
in Winter and Balasubramaniam [2014], where we show that
the long X-ray background, but not the short X-ray background,
varies along with the solar cycle for solar cycles 22-24.
This soft X-ray background variation was also observed
for solar cycle 21 by Wagner [1988] and Aschwanden [1994].
The background is measured as the minimum 1-8 Å flux
in the preceding 24 hours for each 1-minute XRS measurement,
following the procedure of Hock et al. [2013]. This
background measurement is similar to the $X_{b10}$ index, used
in operational forecasting as an integrated irradiance proxy
by Tobiska and Bouwer [2006]. The ratio R is computed
from the start time of each flare until the peak, using the 
dates from the NOAA flare lists. In a small number of cases, 
~ 2% of the total flares, inaccuracies in the flare list show a 
start time that occurs after the peak time. These cases were 
not considered in the final analysis. The maximum R value, 
Rmax, was next computed for each flare. Figure 2 shows an 
example of the X-ray flux, X-ray background level, and R 
for a flare. In computing Rmax, we required that more than 
one R measurement must exist and that both the short and 
long X-ray flux > 10“® W m~^. These criteria excluded 22% 
of the total flares from further analyses with the Rmax parameter,
including near-instantaneous flares with short (~ 1
minute) rise times. 
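
As a rough sketch of the two measurements described above, the following Python computes a 24-hour running-minimum background and the maximum short-to-long flux ratio between flare start and peak. The function names, the sample values, and the naive windowing are assumptions for illustration; the minimum-flux requirement on both bands is left out, and a long time series would call for a more efficient running minimum.

```python
def rolling_background(long_flux, window=1440):
    """Minimum long-band flux over the preceding `window` samples
    (1440 one-minute samples = 24 hours) at each time step.
    The first sample has no history, so it falls back to its own value."""
    background = []
    for i in range(len(long_flux)):
        lo = max(0, i - window)
        preceding = long_flux[lo:i] or long_flux[i:i + 1]
        background.append(min(preceding))
    return background


def max_flux_ratio(short_flux, long_flux, start, peak):
    """Maximum short/long flux ratio from flare start to peak (inclusive).
    Returns None when fewer than two ratio measurements exist."""
    ratios = [s / l for s, l in zip(short_flux[start:peak + 1],
                                    long_flux[start:peak + 1])]
    if len(ratios) < 2:
        return None
    return max(ratios)


# Hypothetical one-minute fluxes (W m^-2), not real GOES data.
long_flux = [3e-7, 2e-7, 4e-7, 8e-7, 2e-6, 1e-6]
short_flux = [1e-8, 1e-8, 3e-8, 1.6e-7, 3e-7, 1e-7]
background = rolling_background(long_flux, window=3)
r_max = max_flux_ratio(short_flux, long_flux, start=2, peak=4)
```

In this toy series the ratio peaks one minute before the long-band flux does, which mirrors the behaviour reported for the real flares.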

Table 1 includes the average statistics for each flare class.
These statistics include the total number of flares, time between
Rmax and 1-8 Å peak flux (where measurements were
possible following the criteria from the previous paragraph),
peak flux, background flux in the short and long band, and
Rmax. Since the ratio R is related to coronal temperature
(see, e.g., Thomas et al. 1985; Garcia 1994; Feldman et al.
1996; White et al. 2005; Ryan et al. 2012), the Rmax value is
related to the maximum temperature occurring during the
flare, where:

$$T_{\max} = A_0 + A_1 R_{\max} + A_2 R_{\max}^2 + A_3 R_{\max}^3 \ \text{MK}. \qquad (2)$$

The $A_n$ are coefficients found in Table 2 of White et al. [2005].
We note that this direct assumption is a simplification
based on an isothermal flare. As discussed in White et al.
[2005], more extensive investigations of the multi-thermal
flare properties are not possible with the two flux measurements
provided with the GOES XRS, but the ratio is still
a useful tool in studying the overall energetics of the flares.
We find that the maximum ratio is significantly higher for
the strongest flares (e.g., 0.33 for X flares and 0.05 for B
flares). As shown in the table, this maximum temperature,
on average, occurs before the peak in the 1-8 Å flux, in part
because the 0.5-4 Å flux peaks ahead of the 1-8 Å
flux by up to 20 minutes. The rise time from flare onset
to Rmax is longer for X and M flares than C and B flares,
consistent with statistical results presented by, e.g., Veronig
et al. [2002].

We find that the average long-wavelength background 
flux is higher for the stronger flares (X and M) than the 
weaker flares (C and B). This is also shown in Figure 3, 
with contour plots of the multivariate density estimates for 
the peak flux and background flux. The density estimates 
are created with the kernel density method, using Gaussian
filters, through the scientific Python (scipy) gaussian_kde
function [Scott, 2009]. A linear correlation is found between
the peak flux and background in the long-wavelength band
with

$$\log B = (1.03 \pm 0.01) \times \log F_{\text{peak},\,1-8\,\text{\AA}} + (0.86 \pm 0.04) \qquad (3)$$

and a correlation coefficient $R^2 = 0.45$. This is particularly
evident in the full sample of flares, whose statistics are dominated
by the more numerous B- and C-class flares (shown in
green). However, the trend does not exist when examining
the distribution of M- and X-class flares alone (shown in red,
$R^2 = 0.07$). At high peak flux levels, there is no difference
between the background levels (this is discussed again in the
following section). Similarly, we computed density plots for
the short-wavelength peak and background. Given the issue
of the 0.5-4 Å background being close to or below the instrumental
limit, we chose to examine flares occurring from
1999-2006, a time period with higher measured backgrounds
that includes the rise through fall phases of the solar cycle 23
maximum. No correlation exists between peak flux
and background in the short-wavelength band ($R^2 < 0.01$).
This further illustrates that the background in the long-wavelength
band, and not the short-wavelength band, is an
appropriate observational parameter that is tied to the peak
flare flux.

3. Separation into Solar Flare Classes 

With the extensive database of X-ray flare properties,
we investigated whether measurements based on the 1-min
GOES XRS observations showed properties useful for predicting
the X-ray class. Specifically, in order to build a
classification model based on the properties of past flare
events, we identified properties that lend themselves to the
use of classification techniques by showing a separation between
different flare classes in their parameter space. The 1-8 Å
non-flare background and the Rmax parameters yielded
such a separation, shown in Figure 4. For each of the flare
classes, the contours represent the parameter space of the diagram
including 50%, 68% (the 1-sigma contour level), and
85% of the flares in the given class. The X-class flares occupy
the right corner of the diagram, indicating high background
flux and high Rmax. The M-class flares share a similar range
of background flux, but with lower Rmax than the X-class
flares. This is also evident in Table 1, where the average
and standard deviation in B are consistent between X and M
flares while the average Rmax is significantly higher for the
X-class. Similarly, C-class flares share the range of backgrounds
with X- and M-class flares, but have lower Rmax
values. However, the B-class flares have significantly lower
measurements of the background flux, with a similar range
of Rmax to the C-class flares. Due to the upper flux limit
of B-class flares ($< 10^{-6}$ W m$^{-2}$), they cannot be observed
when the background is high.

To further test the apparent difference in the B
and Rmax parameters of different flare classes, we computed
Kolmogorov-Smirnov statistics [Kolmogorov, 1933;
Smirnov, 1948]. The Kolmogorov-Smirnov two sample test
determines the probability of two samples being drawn from
the same distribution. To do this, the cumulative distribution
of each sample is computed and the maximum distance
between the two chosen distributions is determined as the
Kolmogorov-Smirnov (KS) statistic. When the KS statistic
is small and the two-tailed p-value approaches 1, the null
hypothesis of both samples being drawn from the same distribution
cannot be rejected. The two sample KS test was
run using the scipy ks_2samp function [Oliphant, 2007]. The
test was run on all combinations of two flare classes (e.g.,
sample 1 as X-class flares and sample 2 as M-class flares)
for each of the parameters B and Rmax. Results are given
in Table 2, including the KS statistic and the two-tailed p-value.
These statistics show that the long X-ray background
is consistent with being drawn from the same population
for X- and M-class flares. To a lesser degree, the KS statistic
is low (~ 0.3) for comparisons between the distribution
of background flux of X- and M-class flares with C-class
flares, but the low p-values (< 0.001) indicate that these
distributions are distinct. Additional comparisons between
the distributions of B and Rmax for the flare classes result
in high KS statistics (from 0.395 to 0.985) and low p-values
(< 0.001). This suggests that the distributions of values in
the B-Rmax parameter space are distinct. This is also evident
in Figure 4, where we show that the majority of flares
of each class (e.g., the 1-sigma or 68% contour level) occupy
a distinct parameter space.
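
To make the KS statistic described above concrete, here is a minimal pure-Python sketch that computes the maximum distance between two empirical cumulative distributions. It is not the scipy implementation; `ks_statistic` is a hypothetical helper, the sample values are invented, and no p-value is computed (scipy's `ks_2samp` returns both the statistic and the p-value).

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Maximum distance between the empirical CDFs of two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in s1 + s2:
        cdf1 = bisect_right(s1, x) / len(s1)
        cdf2 = bisect_right(s2, x) / len(s2)
        d = max(d, abs(cdf1 - cdf2))
    return d


# Invented log-background values for two flare classes (illustration only).
log_b_class_1 = [-5.8, -5.9, -6.0, -6.1, -5.7]
log_b_class_2 = [-7.2, -7.0, -6.9, -7.4, -7.1]
d = ks_statistic(log_b_class_1, log_b_class_2)
```

Two samples with no overlap, as in this toy case, give the largest possible statistic of 1, while identical samples give 0.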

This separation into flare classes hints at differences in
the physical conditions of the solar corona. The long X-ray
background, B, is the non-flare flux level associated with
active regions. It can be construed as a proxy for the magnetic
energy of the corona. The Rmax measurement is associated
with coronal temperature, as well as radiative losses
and emission measure (see, e.g., Thomas et al. 1985; Garcia
1994; Feldman et al. 1996; White et al. 2005; Ryan et al.
2012). A possible explanation for the separation of flare
classes in the B-Rmax parameter space is that the built-up energy of the regions
that produce the flares (measured by B) is directly
related to the amount of energy released in the flare (measured
by Rmax). Since the background measurement is an
average over the entire Sun and not just the flare site, we
expect that a more careful analysis where B is replaced by a
measurement of the energy/flux of the flare site alone would
reveal a tighter correlation with Rmax. This, however, is a
more difficult measurement to make for a real-time forecast
situation.

4. Machine Learning Classification 

To quantify the separation into flare classes shown in the
B-Rmax parameter space, we used machine learning classification
techniques. Specifically, we used the K-nearest
neighbors and nearest centroid algorithms from the Python
machine learning library, scikit-learn [Pedregosa et al.,
2011]. These classifiers build predictions using input data
from a training set of data with known classes. Our training
set included X-ray flares from solar cycles 22-24. The
input parameters were the X-ray background and the maximum
ratio of short to long X-ray flux (Rmax), with the
classes labeled as X, M, C, B. The classifier algorithms then
use the training set to predict what the class of a new flare
event will be. In § 4.1, the machine learning classification
techniques are described. The machine learning algorithms
applied to the input data create what is termed a model,
which is a statistical model based on training data that is
used to make predictions for new data sets. Results of our
analysis applying our statistical models are included in § 4.2.

4.1. Statistical Model Descriptions 

For the K-nearest neighbor classifier, the parameter space
of the logarithm of X-ray background vs. the logarithm of
Rmax is broken up into a grid. At each point in the grid,
the number of data points of each possible class among the k nearest
neighboring points is counted, where k is a user-defined
integer. Whichever class corresponds to the most neighboring
data points is adopted in the model as the likely class of
any new flares with the same values of the X-ray background
and Rmax at that grid point. For our analysis, the grid size
was 0.01 and the number of neighboring points used was k = 5. As an
example of computing a classification for one grid point, we
consider the point where log Rmax = -0.8 and log of the
X-ray background = -8. Figure 4 shows that the 5 nearest
data points to the selected point include 4 B-class flares and
1 C-class flare. Therefore, an unknown flare at the selected
point would be classified as a B-class flare.
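
The majority vote in this example can be sketched in plain Python. This is a simplified stand-in for the scikit-learn classifier rather than the authors' code: `knn_classify` is a hypothetical helper and the training points are invented so that the five nearest neighbours of the query are four B-class flares and one C-class flare.

```python
from collections import Counter
from math import dist


def knn_classify(point, training_points, training_labels, k=5):
    """Label `point` with the most common class among its k nearest
    training points (Euclidean distance in log Rmax, log background)."""
    order = sorted(range(len(training_points)),
                   key=lambda i: dist(point, training_points[i]))
    votes = Counter(training_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Invented training set: (log Rmax, log background) pairs and classes.
training_points = [(-0.82, -7.90), (-0.78, -8.10), (-0.85, -8.00),
                   (-0.75, -7.95), (-0.80, -7.70), (-0.60, -6.50),
                   (-0.70, -5.90)]
training_labels = ["B", "B", "B", "B", "C", "C", "M"]

predicted = knn_classify((-0.8, -8.0), training_points, training_labels)
```

Evaluating the same rule at every node of a 0.01-spaced grid would produce a decision map of the kind described in the text; in scikit-learn the equivalent classifier is `KNeighborsClassifier(n_neighbors=5)`.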

Alternatively, the nearest centroid algorithm uses the
distance from the centroid of the distribution of points
in each class of the training set for predictions. For example,
we consider the classification of a point based on
the distance from the centroid of the X-class flares (log
Rmax = -0.48 and log X-ray background = -5.92) and
M-class flares (log Rmax = -0.72 and log X-ray background
= -5.92). To classify an unknown flare with log
Rmax = -0.5 and log X-ray background = -5.9, we calculate
the Euclidean distance of the point to the centroid
of each of the classes. The selected point is a distance of
$\sqrt{(-0.5 - (-0.48))^2 + (-5.9 - (-5.92))^2} = 0.03$ from the X-class
centroid and, similarly computed, 0.22 from the M-class
centroid. Since it is closer to the X-class centroid, the
selected point is classified as an X-class flare.
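
The worked example above can be checked numerically. The short sketch below uses the two centroids and the test point quoted in the text; the variable names are illustrative only.

```python
from math import dist

# Class centroids quoted in the text, as (log Rmax, log X-ray background).
centroids = {"X": (-0.48, -5.92), "M": (-0.72, -5.92)}
point = (-0.5, -5.9)  # unknown flare to classify

# Euclidean distance from the point to each class centroid.
distances = {label: dist(point, c) for label, c in centroids.items()}

# The predicted class is the one with the nearest centroid.
predicted = min(distances, key=distances.get)
```

Rounded to two decimals, the distances reproduce the 0.03 and 0.22 values given above, and the nearest centroid is the X-class one.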

Since the K-nearest neighbor approach weights according
to the number of points along the grid, it is more accurate for
classifying C- and B-class flares, which include 4-100x more
flares than the M- and X-class flares. Meanwhile, since it is
based solely on the distance from the centroid of the parameters
for a given class, the nearest centroid method does a
better job at classifying the X-class flares. Using these machine
learning methods, classification models were built with
the B and Rmax measurements from flares from solar cycles
22-24. Since there are relatively few (fewer than 300) X-class
flares in the entire sample spanning nearly three decades,
the advantage of using the cycle 22-24 data as a training
set is that the resultant model will include as many X-class
flares as possible. These statistical models built with the
full flare dataset are shown in Figure 5.

To create statistical models that are tested on flares not included in the training
set, but still include a larger number of X-class
flares, we also built models using only the flare parameters
from solar cycles 22 and 23. These models were then used
to predict the solar cycle 24 flare classifications. A concern
with using this approach is whether the flare behavior during
the much weaker solar cycle 24 is different from the flares in
cycles 22 and 23 that were used to create the model. To test
whether there are differences in flare rate between the solar
cycles, we determined the occurrence frequency rate as a
function of 1-8 Å peak flux during the rise to solar maximum
and solar maximum phases for each of the solar cycles. The
occurrence frequency distributions were fit with a power-law
of the form $N(F_{1-8\,\text{\AA}}) = N_1 \times (F_{1-8\,\text{\AA}})^{-\alpha}$, where $N$ is
the occurrence frequency (flare rate per day), $F_{1-8\,\text{\AA}}$ is the
logarithm of flare peak flux (W m$^{-2}$), and the fit parameters
include the normalization factor, $N_1$, and the power-law index,
$\alpha$. We utilize the Levenberg-Marquardt least-squares
minimization technique [Levenberg, 1944; Marquardt, 1963]
to determine the best-fit function parameters, fitting the
occurrence rate where peak flux $> 10^{-6}$ W m$^{-2}$. We omit
the B-class flares from these fits since the low end of the
power-law distribution is near the GOES detection threshold.
Goodness of fit is assessed with the $\chi^2$ statistic, defined
as $\chi^2 = \sum (\text{data} - \text{model})^2 / \text{std}^2$, where the data are the frequency
distribution values ($N$), the model is the power-law
fit, and std is the standard deviation for each of the measurements
of $N$. Good fits are those where $\chi^2/\text{dof}$ is close
to unity, where dof is the number of degrees of freedom, i.e., the number of
data points minus the number of free parameters that are fit.
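
The goodness-of-fit measure described above can be sketched as follows. The helper names `power_law` and `reduced_chi_square`, the fit parameters, and all numerical values are invented for illustration and are not taken from Table 3; the sketch treats F simply as the fitted variable and does not perform the Levenberg-Marquardt minimization itself.

```python
def power_law(flux, norm, alpha):
    """Power-law occurrence frequency N(F) = norm * F**(-alpha)."""
    return norm * flux ** (-alpha)


def reduced_chi_square(data, model, std, n_params):
    """Chi-square per degree of freedom for a fitted model."""
    chi2 = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, std))
    dof = len(data) - n_params  # data points minus free parameters
    return chi2 / dof


# Made-up occurrence rates for illustration only.
flux = [1e-6, 1e-5, 1e-4]
model = [power_law(f, norm=1e-9, alpha=1.8) for f in flux]
data = [1.1 * m for m in model]   # pretend observations, 10% above the model
std = [0.1 * m for m in model]    # pretend 10% uncertainties
chi2_per_dof = reduced_chi_square(data, model, std, n_params=2)
```

Here every point sits one standard deviation from the model, so three points and two free parameters give a reduced chi-square of about 3, which would indicate a poorer fit than a value near unity.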

Using the solar X-ray background analysis from Winter
and Balasubramaniam [2014], the rise phases are defined as
occurring from 08/1986 - 08/1988 (solar cycle 22), 05/1996 -
05/1998 (solar cycle 23), and 12/2008 - 12/2011 (solar cycle
24) and the solar maximum phases are defined as 08/1988 -
08/1991 (solar cycle 22), 05/1999 - 05/2003 (solar cycle 23),
and 12/2011 - 12/2014 (solar cycle 24, noting that this is an
incomplete solar maximum phase including flares up until
the end of the period examined in the paper). Plots of the
frequency occurrence rates are included in Figure 6 and the
best-fit parameters are in Table 3. Aschwanden [2011] includes
power-law estimates from past analyses of frequency
distributions of X-ray flares, which find $\alpha$ to range from
1.58 - 2.0. They find that the slopes change throughout
the solar cycle, with flatter slopes during solar maximum.
Our rates are consistent with these values. Additionally, as
in Aschwanden [2011], we find power-law slopes are similar
during the same phase (e.g., solar maximum), but have
different normalizations. Therefore, we conclude that the
power-law slopes of the frequency rates for flares are consistent
between solar cycles and, as a result, we can effectively
utilize the solar cycle 24 flares as an appropriate test set for
classification models built with the solar cycle 22-23 flare
parameters.

4.2. Results 

Table 4 presents statistics on the percent of correct identifications
(PC), the number of true classifications (TC; number
of flares where the correct flare class is predicted) divided
by the total number of flares (N) examined. The PC computations
show the ability of the models, built with solar cycle
22-24 and solar cycle 22-23 flare parameters, to correctly
classify flares from the test sets of flare parameters, solar
cycle 22-24 and 24 flares. The models built with the solar
cycle 22-24 data correctly classify ~ 90% of the flares with
the K-nearest neighbor model and ~ 75% with the nearest
centroid model. The nearest centroid model correctly classifies
95.9% of X-class flares with the solar cycle 22-24 tested
model and correctly classifies all of the solar cycle 24 X-class
flares. The K-nearest neighbor model better predicts
the M-, C-, and B-class flares, correctly classifying 66.6% of
M-class flares, 91.8% of C-class flares, and 89.1% of B-class
flares from solar cycles 22-24. The model does a better job
of classifying the solar cycle 24 flares, with correct classifications
of 80-90% of M through B flares. From the tests of
the solar cycle 22-24 and solar cycle 22-23 built models, the
performance is similar in correctly identifying solar cycle 24
flares, but the classifications are slightly better for the solar
cycle 22-24 built models (by ~ 5% overall) since the training
set includes the solar cycle 24 flares.

Additional skill scores were computed to better quantify
the results, shown in Table 5. These skill scores include the
probability of detection (POD), false alarm rate (FAR), Heidke
skill score (HSS; see Heidke 1926), and true skill score
(TSS; defined in Hanssen and Kuipers 1965). For each flare
class, the following values were computed: the true classifications
(TC), false null classifications (FN; number of flares
in the class incorrectly predicted not to be in the flare class),
false classifications (FC; number of flares not in the class incorrectly
predicted to be in the flare class), and the true
null classifications (TN; number of flares not in the flare
class and correctly predicted not to be in the flare class).
These definitions are similar to those defined in forecasting
solar energetic particle events and solar flares (recent examples
include Laurenza et al. 2009 and Bloomfield et al. 2012).
Using these definitions:

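$$\mathrm{POD} = \frac{\mathrm{TC}}{\mathrm{TC} + \mathrm{FN}},$$

$$\mathrm{FAR} = \frac{\mathrm{FC}}{\mathrm{TC} + \mathrm{FC}},$$

$$\mathrm{HSS} = \frac{2 \times [(\mathrm{TC} \times \mathrm{TN}) - (\mathrm{FN} \times \mathrm{FC})]}{(\mathrm{TC} + \mathrm{FN})(\mathrm{FN} + \mathrm{TN}) + (\mathrm{TC} + \mathrm{FC})(\mathrm{FC} + \mathrm{TN})},$$

$$\mathrm{TSS} = \frac{\mathrm{TC}}{\mathrm{TC} + \mathrm{FN}} - \frac{\mathrm{FC}}{\mathrm{FC} + \mathrm{TN}}.$$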

The POD values indicate that, as shown with the PC
statistic, the nearest centroid model correctly predicts the
X- and M-class flares better than the K-nearest neighbor
model. However, the FAR shows that the K-nearest neighbor
model makes fewer false predictions of a flare incorrectly
being classified in the X- or M-class. For the X-class flares,
for instance, even though all of the X-class flares are correctly
classified, the FAR is high with the nearest centroid
model since there are 567 non-X-class flares incorrectly classified
as being in the X-class. From a forecast standpoint,
this argues that a combination of the K-nearest neighbor
predictions, which have low FAR, and the nearest centroid
predictions, which have high POD, would be necessary
for making solar flare predictions. For the C- and B-class
flares, the combination of low FAR and high POD for the
K-nearest neighbor model proves it to be the superior model
for predicting lower flux flares.
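
The four skill scores defined in this section can be computed directly from the TC, FN, FC, and TN counts. The sketch below is illustrative: `skill_scores` is a hypothetical helper, the X-class example uses the 196 flares and 567 false classifications quoted in the text with all 196 treated as correctly classified, and the TN count of 40,000 is an assumed placeholder rather than a value from Table 5.

```python
def skill_scores(tc, fn, fc, tn):
    """Return POD, FAR, HSS, and TSS for one flare class.

    tc: true classifications, fn: false null classifications,
    fc: false classifications, tn: true null classifications.
    """
    pod = tc / (tc + fn)
    far = fc / (tc + fc)
    hss = (2 * (tc * tn - fn * fc)
           / ((tc + fn) * (fn + tn) + (tc + fc) * (fc + tn)))
    tss = tc / (tc + fn) - fc / (fc + tn)
    return {"POD": pod, "FAR": far, "HSS": hss, "TSS": tss}


# X-class example: counts of 196 and 567 come from the text;
# tn=40000 is an assumed placeholder for the remaining flares.
x_scores = skill_scores(tc=196, fn=0, fc=567, tn=40000)
```

With these inputs the POD is 1 while the FAR is roughly 0.74, and the TSS stays close to 1 because the false classifications are few relative to the assumed number of non-X-class flares, whereas the HSS is pulled down by the false classifications outnumbering the true ones.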


The final statistics, HSS and TSS, are commonly used
skill scores with an advantage over the PC, POD, and FAR
statistics in that they incorporate all of the parameters TC,
FN, FC, and TN. By using all combinations, HSS takes into
account the expected number of correct identifications due
to chance. An advantage of TSS over HSS, as pointed out
in Bloomfield et al. [2012], is that the TSS does not change
depending on the number of flares in the sample. The
results of the TSS show the nearest centroid model as the
best performer for the X-class flares, while the K-nearest
neighbor is better for the weaker flares. The HSS roughly
agrees with the TSS, with the exception of the X- and M-class
flares in the nearest centroid model. The lower values of
HSS are likely due to the smaller number of flares in these 
categories compared to the number of false predictions (for 
instance, there are 196 X-class flares, but 567 non-X-class 
flares were incorrectly predicted as X-class). 

5. Discussion 

Many of the previous studies of the properties of solar
flares rely upon detailed analyses of a single or small group
of flares. However, statistical analyses of large samples of
flares offer new insights into the physical properties associated
with these events. For instance, studies of the GOES X-ray
light curves by Aschwanden and Freeland [2012] tested a
theoretical model (fractal-diffusive self-organized criticality)
for flare generation, finding that nanoflares are not likely to
play a major role in flare heating, and Ryan et al. [2012] built
upon previous studies to refine peak temperature and emission
measure statistics. In this paper, we present a new way
to separate solar flares into the NOAA flare classes based on
a statistical analysis of the GOES X-ray observations of the
~ 50,000 flares occurring from 1986 - mid-2014.

These flare classification predictions are based upon observed
X-ray properties: the 24-hour non-flare X-ray background
in the 1-8 Å band and the maximum ratio of the
short to long band flux during the flare. These parameters
reveal a separation between the X-, M-, C-, and B-class
flares. The separation was quantified and verified through
machine-learning algorithms and skill score statistics, applied
to the solar flare parameters from solar cycles 22-24.

The Rmax parameter is related to the maximum temperature
of the flare. Using the relations and constants from
White et al. [2005], we find that the maximum temperatures
of flares range from ~ 16 - 49 MK (X-class), 6 - 15 MK (M-class),
4 - 6 MK (C-class), and ~ 4 - 11 MK (B-class). These
results are consistent with recent analyses, e.g., peak temperature
from GOES presented in Ryan et al. [2012]. The
maximum temperature is reached from a few minutes before
to up to 25 minutes after the start of the flare. The
stronger the flare, the more potential warning time we may
have to predict the peak. For example, the average Rmax for X-class
flares occurs 6.7 minutes before the peak, while the time for
C- and B-class flares is much shorter at ~ 3 minutes. One
potential challenge to applying this technique for real-time
flare forecasting is that the predictions are made after the
maximum is reached. With XRS observations of 1 minute
resolution, as used in the current analysis, the short warning
time is significantly decreased. In this case, however,
the predictions are still useful in determining that the flare
will be entering the declining phase. Also, NOAA SWPC
currently provides finer time resolution XRS archival observations
of a few seconds cadence. Accessing these data in
real-time would greatly enhance the warning time available
from our technique.

While the maximum temperature is an indicator, like the
peak flux of the flare, of the energy release, the non-flare
background is less straightforward to interpret. This parameter
is the integrated X-ray flux of the Earth-facing Sun
during the lowest flux period in the 24 hours preceding the flare.
Integrated X-ray flux is dominated by active regions (e.g.,
Acton 1996 showed that more than 50% of the coronal luminosity
is associated with 2% of the solar surface). The
non-flare background is therefore a measure of the active
regions, or areas of enhanced coronal heating. In past studies
of X-ray imaging from SOHO and Yohkoh and full-disk
magnetograms, Fisher et al. [1998] showed that X-ray luminosity
is highly correlated with the active region's unsigned
magnetic flux, with Lx ~ Tan et al. [2007] confirmed 

this in a study of 160 active regions and also found a strong 
correlation with the magnetic energy dissipation (also found 
earlier by Abramenko et al. 2006), which they claim could be 
showing the importance of photospheric turbulent motions 
to heating of the corona above active regions. Therefore, 
the X-ray non-flare background flux is an indicator of the 
average magnetic energy of the active regions as well as the 
turbulent energy of the photosphere below these active regions.
The phase-separation between strong flares and weak
flares is guided by the magnetic energy available to produce 
flares. Higher turbulence or stored magnetic energy leads to 
more energy release in the flare, measured by the maximum 
flare temperature. 

Since the non-flare background is measured in the 24
hours preceding the flare, it can be used for flare predictions
in advance of the flare. One way the background can
be used in real-time forecasting is as a threshold for predicting
when strong flares may or may not occur. This is
possible due to the large separation in the range of background
values of strong flares (M and X class) versus the
weakest flares (B class). From analysis of the distributions
of the background for the different flare classes, we find that
at the $-2\sigma$ level (the 2.28th percentile) for M and X class
flares the background is $1.6 \times 10^{-7}$ W m$^{-2}$. Therefore, there
is a low probability of a background flux below this level
being associated with an X or M class flare. For C, M, and
X class flares, the $-2\sigma$ level is $1.07 \times 10^{-7}$ W m$^{-2}$, meaning
anything lower than this background flux is unlikely to be
associated with a strong flare.
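
As a small illustration of this threshold idea, the sketch below shows where the 2.28th percentile comes from and how a background cut could be applied. The function name `strong_flare_unlikely` is hypothetical, and the default threshold of 1.6e-7 W m^-2 is an assumed reading of the M- and X-class value quoted above.

```python
from statistics import NormalDist

# Fraction of a normal distribution lying below -2 sigma, as a percentile.
percentile = NormalDist().cdf(-2.0) * 100


def strong_flare_unlikely(background, threshold=1.6e-7):
    """True when the 24-hour background (W m^-2) lies below the assumed
    -2 sigma level of the M- and X-class background distribution."""
    return background < threshold


# Hypothetical background measurements (W m^-2).
quiet_sun = strong_flare_unlikely(5.0e-8)
active_sun = strong_flare_unlikely(1.2e-6)
```

A cut of this kind only says that a strong flare is unlikely at low background; it does not by itself predict that one will occur when the background is high.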

Based on these X-ray background results, during solar
minimum when the background flux is low we expect no X-
or M-class flares. For instance, the results from Figure 1
of Winter and Balasubramaniam [2014] show that solar cycle
24 had average 2-week X-ray background measurements
below $1.07 \times 10^{-7}$ W m$^{-2}$ for the first two years of the cycle.
This means only B-class flares would have been expected
in these years (2009-2011). From Figure 1, it is clear
that relatively few strong flares had occurred during this 
time. During these two years, the NOAA list records 816 
flares, including no X-class flares, 13 M-class flares (1.6% of 
flares), 103 C-class flares (12.6% of flares), and 700 B-class 
flares (85.8% of flares). Since this is a rough estimate of 
the background based on 2-week averages, we expect that 
the background occasionally rose above this low threshold, 
accounting for the small number of M-class flares observed 
during the last solar minimum. These flare rates during 
the beginning of solar cycle 24 are similar to those in the 
first two years of solar cycles 22 and 23 for X- and M- class 
flares, but, likely due to the higher backgrounds in cycles 
22 and 23, there are more C-class and fewer B-class flares 
in cycles 22 and 23. For comparison, the beginning of solar 
cycle 22 had 1460 flares (2 X-class, 41 M-class, 410 C-class, 
and 1007 B-class flares from 1986-1988) and solar cycle 23 
had 914 flares (4 X-class, 18 M-class, 262 C-class, and 630 
B-class flares from 1996-1998). 
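
The class fractions quoted above follow directly from the flare counts. The short Python sketch below reproduces that arithmetic for the three cycle starts; the counts are the ones given in the text, while the helper name `class_fractions` is only illustrative.

```python
def class_fractions(counts):
    """Return (total, {class: percent of total}) for a dict of flare counts."""
    total = sum(counts.values())
    return total, {cls: 100.0 * n / total for cls, n in counts.items()}

# Flare counts in the first two years of each cycle, as quoted in the text
cycle_starts = {
    "cycle 22 (1986-1988)": {"X": 2, "M": 41, "C": 410, "B": 1007},
    "cycle 23 (1996-1998)": {"X": 4, "M": 18, "C": 262, "B": 630},
    "cycle 24 (2009-2011)": {"X": 0, "M": 13, "C": 103, "B": 700},
}

for name, counts in cycle_starts.items():
    total, pct = class_fractions(counts)
    summary = ", ".join(f"{cls}: {pct[cls]:.1f}%" for cls in "XMCB")
    print(f"{name}: {total} flares ({summary})")
```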

Additional investigation into the relationship between
temperature and non-flare X-ray background will lead to
refinement of our flare predictions. Future work will investigate
how the separation is affected by choice of the non-flare
background. For instance, for forecasting purposes the goal
is to have a longer lead time before the flare occurs. Different
binning periods for non-flare background can be tested
to determine how much lead time is possible while still
preserving the phase-separation between weak and strong flares.

We will also look into increasing the warning time for flare
class predictions, by determining whether other observables
help us determine what the Rmax/peak value will be further
in advance. One such possible path is through determining
characteristic flare shapes for the rise time, which can be
used at the start of the flare to predict when the maximum
will occur.

Figure 1. The distribution of flares of each type in the
NOAA flare list, including observations from 1986 - July
2014. Gray lines trace the monthly sunspot number. The
majority of X, M, and C class flares occur close to solar
maximum, the maximum in sunspot number.



Figure 2. For the ~50,000 flares in the NOAA flare lists,
occurring from 1986 - present, we calculated the maximum
ratio of the short (0.5-4 Å) to long (1-8 Å) X-ray
bands. An example is shown for an X-class flare from
2011. The ratio R (red), long wavelength (gray), and
short wavelength (black) X-ray flare profiles are shown,
with the maximum in R marked with a red dashed line
and the 1-8 Å flare peak marked with a black dashed
line. The dot-dashed line marks the 1-8 Å non-flare
background, B.
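
As a minimal sketch of the quantity illustrated in Figure 2, the Python below computes the maximum short-to-long band ratio and its offset from the long-band peak for a pair of flux series. The flux values are synthetic and the function names are hypothetical; real GOES light curves would also need background handling and quality filtering that are not shown here.

```python
def max_flux_ratio(short_flux, long_flux):
    """Return (R_max, index) where R = short/long is largest."""
    if len(short_flux) != len(long_flux):
        raise ValueError("flux series must have the same length")
    best_ratio, best_index = None, None
    for i, (s, l) in enumerate(zip(short_flux, long_flux)):
        if l <= 0.0:
            continue  # skip samples with no usable long-band flux
        ratio = s / l
        if best_ratio is None or ratio > best_ratio:
            best_ratio, best_index = ratio, i
    if best_ratio is None:
        raise ValueError("no valid long-band samples")
    return best_ratio, best_index

def peak_index(long_flux):
    """Index of the long-band (1-8 Angstrom) flare peak."""
    return max(range(len(long_flux)), key=lambda i: long_flux[i])

# Synthetic one-minute samples (W m^-2), not real GOES data
long_band = [1.0e-6, 5.0e-6, 1.0e-5, 1.2e-5, 6.0e-6]
short_band = [1.0e-8, 5.0e-7, 3.0e-6, 2.4e-6, 6.0e-7]

r_max, r_index = max_flux_ratio(short_band, long_band)
delay = peak_index(long_band) - r_index  # samples between R_max and peak
print(r_max, r_index, delay)
```

In this synthetic example the ratio peaks one sample before the long-band flux does.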

Figure 3. Kernel density estimates showing the two-dimensional
probability distribution of peak flux and
background flux in both the long-wavelength (top) and
short-wavelength (bottom) X-ray bands. The green contours
show the density distributions for all flares, which
are dominated by the more numerous C- and B-class
flares, while the red contours show the density distributions
for X- and M-class flares. There is no correlation
between short-wavelength peak flux and background,
while there is a positive correlation in the relationship
between long-wavelength peak flux and background
($R^2 = 0.45$, see § 2 for details).

Figure 4. The observational parameters of 1-8 Å non-flare
X-ray background flux and the maximum ratio of
the 0.5-4 Å/1-8 Å flux (Rmax) separate the NOAA flares
effectively into different regions of parameter space based
on the peak flux. The left panel shows a scatter plot of the
measured parameters of the ~50,000 NOAA flares (with
color-coding corresponding to flare class as blue = X, red
= M, green = C, and yellow = B). In the right panel,
contours enclose 50% (solid line),
68% (dashed line), and 85% (dashed dotted line) of the
X- (blue), M- (red), C- (green), and B- (yellow) class
flares. Lower peak flux occurs when the background is
also low. High peak flux occurs when the background
flux and Rmax parameters are high.

Figure 5. Classification models built from the solar
cycle 22-24 flare parameters for the K-nearest neighbor
(left) and nearest centroid (right) algorithms. The
K-nearest neighbor model classifications use the classes (X,
M, C, B) of the 5 nearest points in the training set to each
grid point (0.01 x 0.01) to predict the class of a flare with
the B and Rmax values at that grid point. The nearest
centroid model predicts classes based on the Euclidean
distance each grid point is from the centroid of B and
Rmax for each class (X, M, C, B). See the text for more
detail on the classification algorithms.

Figure 6. Occurrence frequency distributions of the
peak 1-8 Å flux are shown for solar cycles 22, 23, and
24 both during the rise to solar maximum and during
solar maximum. Flare rates are similar between solar
cycles for the rise to solar maximum and during solar
maximum.

Figure 7. Results from classification models built with
the solar cycle 22 and 23 flare parameters for the
K-nearest neighbor (left) and nearest centroid (right)
algorithms, applied to the solar cycle 24 data (points,
color-coded as in Figure 4). The background shading indicates
the model predicted class (color-coded as in Figure 5).
The K-nearest neighbor algorithm correctly classifies 83%
of the flares. However, the nearest centroid algorithm
classifies the highest peak flux flares (X) with 100%
accuracy compared to the 74% accuracy from the K-nearest
neighbor model.

Table 1. X-ray Flare Statistics.

Class  N      <T(P - Rmax)>  <P>                     <B_{0.5-4 Å}>           <B_{1-8 Å}>             <Rmax>
              (min)          ($10^{-7}$ W m$^{-2}$)  ($10^{-6}$ W m$^{-2}$)  ($10^{-6}$ W m$^{-2}$)
X      290    6.7 ± 13.2     2400 ± 1700             1.9 ± 2.1               1.2 ± 0.7               0.33 ± 0.08
M      3742   4.8 ± 9.9      240 ± 180               1.6 ± 1.7               1.2 ± 0.7               0.18 ± 0.06
C      28803  3.0 ± 7.0      31 ± 20                 0.8 ± 1.0               0.8 ± 0.5               0.08 ± 0.04
B      15751  2.7 ± 6.3      4.9 ± 2.6               0.2 ± 0.2               0.2 ± 0.1               0.05 ± 0.15

The statistics include number of flares (N), average and standard deviation of the time between the 1-8 Å peak and Rmax
(<T(P - Rmax)>), average and standard deviation of the peak flux (<P>), average and standard deviation of the
background 0.5-4 Å flux (<B_{0.5-4 Å}>), average and standard deviation of the background 1-8 Å flux (<B_{1-8 Å}>), and
average and standard deviation of the maximum ratio of 0.5-4 Å/1-8 Å (<Rmax>), for each flare class.


Table 2. Kolmogorov-Smirnov Statistics for Two Sample Comparisons.

           B                Rmax
Classes    KS      p        KS      p
X, M       0.056   0.600    0.776   0.000
X, C       0.298   0.000    0.976   0.000
X, B       0.857   0.000    0.985   0.000
M, C       0.283   0.000    0.706   0.000
M, B       0.806   0.000    0.888   0.000
C, B       0.695   0.000    0.395   0.000

The K-S statistic (KS) and probability value (p) are listed for tests on the distributions of the long X-ray background
and Rmax for all combinations of comparisons of two X-ray flare classes.
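
For readers unfamiliar with the two-sample K-S statistic reported in Table 2, the following Python sketch computes it as the largest gap between two empirical cumulative distribution functions. The sample values are made up for illustration, and the p-value is deliberately left out; in practice a library routine such as `scipy.stats.ks_2samp` would return both the statistic and the p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d

# Hypothetical Rmax samples for two flare classes (not taken from the paper)
rmax_strong = [0.28, 0.31, 0.33, 0.36, 0.40]
rmax_weak = [0.03, 0.04, 0.05, 0.06, 0.30]
print(ks_statistic(rmax_strong, rmax_weak))
```

A statistic near 1 means the two distributions barely overlap, while a value near 0 (as for the X, M background comparison in Table 2) means they are nearly indistinguishable.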


Table 3. Best-fit Parameters for Power-Law Fits to the Occurrence Flare Rates.

Solar Cycle Phase   Ni               $\alpha$       $\chi^2$/dof
22 Rise             -6.65 ± 0.29     1.97 ± 0.06    33.2/35
23 Rise             -5.53 ± 0.40     1.69 ± 0.08    40.8/23
24 Rise             -6.46 ± 0.34     1.89 ± 0.07    34.7/29
22 Maximum          -5.96 ± 0.15     1.99 ± 0.03    74.7/47
23 Maximum          -7.03 ± 0.16     2.17 ± 0.03    28.7/46
24 Maximum          -6.84 ± 0.20     2.09 ± 0.01    28.6/41

Occurrence frequency distributions for the rise phase (Rise, from the beginning of the solar cycle towards maximum) and
during solar maximum (Maximum) were fit with a power-law model (see § 4.1 for details). The best-fit values and errors
are shown for the normalization factor (Ni, the normalization factor for the logarithm of the 1-8 Å peak flux in W m$^{-2}$)
and the power-law index ($\alpha$). The goodness of fit is assessed with the reduced $\chi^2$ statistic ($\chi^2$ divided by the degrees of
freedom, dof).
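
The reduced statistic described in the note to Table 3 is a simple division, sketched below in Python using the chi-square and degrees-of-freedom pairs listed in the table. The dictionary layout and function name are illustrative only.

```python
def reduced_chi2(chi2, dof):
    """Chi-square divided by the number of degrees of freedom."""
    if dof <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi2 / dof

# (chi-square, degrees of freedom) pairs as listed in Table 3
fits = {
    "22 Rise": (33.2, 35),
    "23 Rise": (40.8, 23),
    "24 Rise": (34.7, 29),
    "22 Maximum": (74.7, 47),
    "23 Maximum": (28.7, 46),
    "24 Maximum": (28.6, 41),
}

reduced = {name: reduced_chi2(chi2, dof) for name, (chi2, dof) in fits.items()}
for name, value in reduced.items():
    print(f"{name:<12s} {value:.2f}")
```

Values close to 1 indicate a fit that is consistent with the quoted uncertainties; the largest values in this table are for the cycle 23 rise and the cycle 22 maximum.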





Table 4. Classification Model Statistics.

Class      N       KNN PC   NC PC    N       KNN PC   NC PC    N       KNN PC   NC PC
Model      22-24   22-24    22-24    22-24   22-24    22-24    22-23   22-23    22-23
Test Set   22-24   22-24    22-24    24      24       24       24      24       24
All        39391   88.9     75.0     7032    88.9     73.7     7032    83.4     73.0
X          196     59.2     95.9     23      73.9     100      23      73.9     100
M          2964    66.6     70.5     349     80.5     53.6     349     75.6     51.0
C          23425   91.8     72.9     3987    90.6     73.7     3987    85.5     72.8
B          12806   89.1     79.7     2673    87.4     76.2     2673    81.4     76.0

Models were built using the K-nearest neighbor (KNN) and nearest centroid (NC) methods, using the solar cycle 22-24
flare parameters. The number of flares in each category (N), along with the percent of correct classifications are shown
for the model built and applied to the flare data (e.g., KNN PC is the percent correct for the KNN model). The solar
cycles used to build each of the models are listed in the row labeled Model and the listed statistics are for testing the
model on the solar cycles listed in the row labeled Test Set.
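
The two classifiers compared in Table 4 can be sketched in plain Python as follows. This is an illustrative toy, not the authors' implementation: the feature pair (log10 of the background B, Rmax), the absence of any feature scaling, and the six training points are all assumptions made for the example. Only the choice of 5 neighbors as the default and the use of Euclidean distance to class centroids come from the Figure 5 caption.

```python
import math
from collections import Counter

def knn_predict(train, point, k=5):
    """Majority class among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def fit_centroids(train):
    """Mean feature vector (centroid) of each class in the training set."""
    sums = {}
    for (b, r), label in train:
        sb, sr, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sb + b, sr + r, n + 1)
    return {label: (sb / n, sr / n) for label, (sb, sr, n) in sums.items()}

def nearest_centroid_predict(centroids, point):
    """Class whose centroid is closest to `point` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

def percent_correct(predicted, actual):
    """Percentage of predictions that match the true labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# Hypothetical training set: ((log10 B, Rmax), class)
train = [
    ((-6.8, 0.04), "B"), ((-6.7, 0.05), "B"), ((-6.9, 0.06), "B"),
    ((-6.0, 0.30), "X"), ((-5.9, 0.35), "X"), ((-5.8, 0.33), "X"),
]
centroids = fit_centroids(train)

test_points = [(-6.75, 0.05), (-5.9, 0.32)]
test_labels = ["B", "X"]
knn_out = [knn_predict(train, p, k=3) for p in test_points]
nc_out = [nearest_centroid_predict(centroids, p) for p in test_points]
print(knn_out, nc_out, percent_correct(knn_out, test_labels))
```

The nearest centroid model summarizes each class by a single point, which helps explain why it can recover rare classes such as X well while misclassifying more of the abundant, overlapping classes.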


Table 5. Skill Scores.

                KNN Model                    NC Model
Class  N        POD    FAR    HSS    TSS     POD    FAR    HSS    TSS
X      196      0.59   0.28   0.65   0.59    0.96   0.75   0.39   0.94
M      2964     0.67   0.22   0.70   0.65    0.71   0.66   0.40   0.60
C      23425    0.92   0.10   0.78   0.77    0.73   0.14   0.53   0.55
B      12806    0.89   0.11   0.84   0.84    0.80   0.19   0.71   0.71

The number of flares (N), probability of detection (POD), false alarm rate (FAR), Heidke skill score (HSS), and true skill
score (TSS) presented for the K-nearest neighbor (KNN) and nearest centroid (NC) models built with the solar cycle
22-24 flare parameters and used to classify the same sample of flares.
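
The skill scores in Table 5 can be computed from a 2 x 2 contingency table for each class (that class versus all others). The Python sketch below uses the standard definitions, with FAR taken as the ratio of false alarms to all positive forecasts; that this is the definition used for the table is an assumption. The example counts are not from the paper: they are back-calculated approximately from the rounded nearest centroid X-class entries in Tables 4 and 5, purely to show that the formulas give values of the same size.

```python
def skill_scores(tp, fn, fp, tn):
    """POD, FAR, HSS, and TSS from contingency-table counts.

    tp: hits, fn: misses, fp: false alarms, tn: correct rejections.
    Assumes every denominator below is non-zero.
    """
    pod = tp / (tp + fn)           # probability of detection
    far = fp / (tp + fp)           # false alarms / all positive forecasts
    pofd = fp / (fp + tn)          # probability of false detection
    tss = pod - pofd               # true skill score
    hss = 2.0 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )                              # Heidke skill score
    return {"POD": pod, "FAR": far, "HSS": hss, "TSS": tss}

# Approximate, back-calculated counts for the NC model on X-class flares
# (196 X-class flares out of 39391 in total); illustrative only
total = 39391
tp, fn, fp = 188, 8, 564
tn = total - tp - fn - fp

scores = skill_scores(tp, fn, fp, tn)
print({name: round(value, 2) for name, value in scores.items()})
```

The example shows how a rare class can combine a high POD and TSS with a high FAR: the false alarms are few relative to the very large number of non-X flares, but many relative to the number of true X-class flares.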




